Multiple Imputation by Chained Equations

MICE vs. Fully Conditional Specification vs. Iterative Imputation

MICE belongs to the Fully Conditional Specification (FCS) framework (Handling Missing Data#Fully Conditional Specification (FCS))

MICE is generally the statistical method, while Scikit-learn’s IterativeImputer is a Python implementation.

How MICE works

  1. start with initial values: filling the missing values with initial values (e.g. mean)
  2. iterate over each feature:
    1. Predict X1 using X2, X3 -> Replace missing X1
    2. Predict X2 using X1, X3 -> Replace missing X2
    3. Predict X3 using X1, X2 -> Replace missing X3
  3. Repeat this cycle multiple times until convergence
  4. Add randomness in predictions
  5. Repeat whole process  for a number of iterations

Why “Multiple” Imputation

Key steps

  1. create N different complete datasets
  2. each dataset has slightly different plausible imputed values
  3. train/analyze each dataset separately
  4. combine results (average coefficients, adjust variance)

Why “Chained Equations”?

Why use MICE

Pros

Cons